Exploration server for computer science research in Lorraine

Pattern-Based Approach to Table Extraction

Internal identifier: 001520 (Main/Exploration); previous: 001519; next: 001521

Authors: K. C. Santosh [France]; Abdel Belaïd [France]

RBID : ISTEX:F624E5693265B0B7AB42EEE1FB3BF2E3FE659AF0

Abstract

In this paper, we present a client-driven approach to automatically extracting the information content of tables in document images. We start with a graph-based representation of a set of key fields selected by clients and perform graph mining on a document to learn them and produce a model. Such models are intended to extract information content in the absence of clients. To avoid the NP-hard general problem, our graph matching is based on relation assignment to check whether pairs of nodes are semantically identical. We have validated the concept on a real-world industrial problem.
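
As a rough illustration of the idea summarized in the abstract, and not the authors' implementation, the sketch below represents key fields as labelled nodes, assigns each ordered pair of nodes a coarse spatial relation, and compares two graphs by checking whether same-labelled pairs carry the same relation. Because nodes are aligned by label, the comparison is quadratic in the number of fields rather than a general graph-matching search. The field names, the coordinates, the relation vocabulary (`left_of`, `right_of`, `above`, `below`), and the functions `relation`, `build_graph`, and `match_score` are all assumptions introduced here.

```python
from itertools import permutations


def relation(a, b):
    """Coarse spatial relation of point a with respect to point b.

    The four-word vocabulary is hypothetical, chosen only for this sketch.
    Coordinates follow image conventions: y grows downwards.
    """
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "left_of" if dx > 0 else "right_of"
    return "above" if dy > 0 else "below"


def build_graph(fields):
    """Map each ordered pair of field labels to its relation.

    fields: dict of label -> (x, y) centre of the field (assumed input format).
    """
    return {
        (u, v): relation(fields[u], fields[v])
        for u, v in permutations(fields, 2)
    }


def match_score(model, candidate):
    """Fraction of the model's labelled pairs whose relation is identical
    in the candidate graph."""
    if not model:
        return 0.0
    agree = sum(1 for pair, rel in model.items() if candidate.get(pair) == rel)
    return agree / len(model)


# Hypothetical key fields selected on a training document.
model = build_graph({"item": (10, 50), "quantity": (120, 50), "price": (230, 50)})

# Same layout elsewhere on a page, with small positional jitter.
page = build_graph({"item": (14, 300), "quantity": (118, 302), "price": (240, 299)})

# Same labels, but with the column order reversed.
shuffled = build_graph({"item": (240, 300), "quantity": (118, 302), "price": (14, 299)})

print(match_score(model, page))
print(match_score(model, shuffled))
```

In this toy setting a candidate region scores high only when its fields are arranged like the learned pattern; a real system would need richer relations and a tolerance for missing fields.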

DOI: 10.1007/978-3-642-38628-2_91


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Pattern-Based Approach to Table Extraction</title>
<author>
<name sortKey="Santosh, K C" sort="Santosh, K C" uniqKey="Santosh K" first="K. C." last="Santosh">K. C. Santosh</name>
</author>
<author>
<name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<affiliation>
<country>France</country>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F624E5693265B0B7AB42EEE1FB3BF2E3FE659AF0</idno>
<date when="2013" year="2013">2013</date>
<idno type="doi">10.1007/978-3-642-38628-2_91</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-6BW59TPP-8/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">003A95</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">003A95</idno>
<idno type="wicri:Area/Istex/Curation">003A51</idno>
<idno type="wicri:Area/Istex/Checkpoint">000146</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000146</idno>
<idno type="wicri:doubleKey">0302-9743:2013:Santosh K:pattern:based:approach</idno>
<idno type="wicri:Area/Main/Merge">001532</idno>
<idno type="wicri:Area/Main/Curation">001520</idno>
<idno type="wicri:Area/Main/Exploration">001520</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Pattern-Based Approach to Table Extraction</title>
<author>
<name sortKey="Santosh, K C" sort="Santosh, K C" uniqKey="Santosh K" first="K. C." last="Santosh">K. C. Santosh</name>
<affiliation wicri:level="4">
<country xml:lang="fr">France</country>
<wicri:regionArea>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Nancy</settlement>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="4">
<country xml:lang="fr">France</country>
<wicri:regionArea>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Nancy</settlement>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">France</country>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s" type="main" xml:lang="en">Lecture Notes in Computer Science</title>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: In this paper, we address a client-driven approach to automatically extract information content within the table in document images. We start with a graph-based representation of a set of key-fields selected by clients and perform graph mining in a document in order to learn them to produce a model. Such models are aimed to use to extract information content in the absence of clients. To avoid NP-hard general problem, our graph matching is based on relation assignment to see whether pairs of nodes are semantically identical. We have validated the concept by using a real-world industrial problem.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement>
<li>Nancy</li>
</settlement>
<orgName>
<li>Centre national de la recherche scientifique</li>
<li>Institut national de recherche en informatique et en automatique</li>
<li>Laboratoire lorrain de recherche en informatique et ses applications</li>
<li>Université de Lorraine</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Grand Est">
<name sortKey="Santosh, K C" sort="Santosh, K C" uniqKey="Santosh K" first="K. C." last="Santosh">K. C. Santosh</name>
</region>
<name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<name sortKey="Santosh, K C" sort="Santosh, K C" uniqKey="Santosh K" first="K. C." last="Santosh">K. C. Santosh</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001520 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001520 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:F624E5693265B0B7AB42EEE1FB3BF2E3FE659AF0
   |texte=   Pattern-Based Approach to Table Extraction
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022